# English speech recognition

A lightweight audio model that excels at speech recognition, audio understanding, and following audio instructions, among other tasks.

Transformers English

Deepfake Audio Detection

A deepfake audio detection model fine-tuned from facebook/wav2vec2-base, achieving 95.45% accuracy on the evaluation set.

Audio Classification

Parakeet Tdt Ctc 1.1b

Parakeet TDT-CTC 1.1B is an automatic speech recognition model capable of transcribing English speech with punctuation and capitalization, jointly developed by NVIDIA NeMo and Suno.ai.

Speech Recognition English

Faster Whisper Medium.en

The OpenAI Whisper medium.en model converted to the CTranslate2 format for efficient automatic speech recognition.

Speech Recognition English

A sequence-to-sequence model supporting English automatic speech recognition (ASR), capable of outputting normalized text, timestamp annotations, and multi-speaker segmentation.

Speech Recognition

Transformers English

Exp W2v2t En Vp Nl S281

An English speech recognition model fine-tuned from facebook/wav2vec2-large-nl-voxpopuli on the Common Voice 7.0 training set.

Speech Recognition

Transformers English

Exp W2v2t En No Pretraining S289

An English speech recognition model that starts from a randomly initialized wav2vec2 architecture (no pretraining) and is trained on the Common Voice 7.0 dataset.

Speech Recognition

Transformers English

Wav2vec2 2 Bart Large No Adapter

This model is an automatic speech recognition (ASR) model trained on the LibriSpeech ASR dataset, capable of converting English speech into text.

Speech Recognition

An automatic speech recognition model trained on the LibriSpeech ASR dataset, designed to convert English speech into text.

Speech Recognition

Wav2vec2 Large Xlsr 53 English

An English speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice 6.1 dataset.

Speech Recognition English

Wav2vec2 Base 10k Voxpopuli Ft En

A Wav2Vec2 base model pre-trained on the 10K-hour unlabeled subset of the VoxPopuli corpus and fine-tuned on transcribed English data, suitable for English speech recognition.

Speech Recognition

Transformers English

Unispeech Sat Base Timit Ft

An automatic speech recognition model fine-tuned from microsoft/unispeech-sat-base on the TIMIT_ASR dataset, achieving a word error rate (WER) of 41.01% on the evaluation set.

Speech Recognition

patrickvonplaten
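Several entries on this page report a word error rate (WER). As a rough illustration of what that figure measures, the sketch below computes WER as the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name and the sample sentences are hypothetical, and individual model cards may apply text normalization (casing, punctuation removal) that this sketch does not.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors while the denominator is fixed by the reference, WER is not capped at 100%: an output with many spurious words can score above 1.0.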

Asr Wav2vec2 Commonvoice En

An end-to-end automatic speech recognition system trained on the CommonVoice English dataset, combining a pre-trained wav2vec 2.0 model with a CTC decoder.

Speech Recognition English
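Many of the wav2vec 2.0 entries listed here produce text through CTC decoding. As a minimal sketch of the simplest variant, greedy CTC decoding takes the most likely token per audio frame, merges consecutive repeats, and then drops the blank token. The vocabulary, blank index, and frame sequence below are made up for illustration; a given model may instead use beam search or a language model, which this sketch does not cover.

```python
from itertools import groupby
from typing import Mapping, Sequence


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    id_to_char: Mapping[int, str],
    blank_id: int = 0,
) -> str:
    """Collapse per-frame argmax token ids into text using the CTC rule."""
    # Step 1: merge runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove blanks, which only serve to separate repeated characters.
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


if __name__ == "__main__":
    # Hypothetical vocabulary; index 0 is assumed to be the CTC blank.
    vocab = {0: "<blank>", 1: "h", 2: "e", 3: "l", 4: "o"}
    # The blank between the two runs of 3 keeps the double "l" from merging.
    frames = [1, 1, 2, 0, 3, 3, 0, 3, 4, 4]
    print(ctc_greedy_decode(frames, vocab))
```

The blank token is what lets CTC emit doubled letters: without a blank between them, two adjacent runs of the same id would collapse into one character.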

W2v Timit Ft 4001

A speech recognition model based on the Wav2Vec 2.0 architecture, fine-tuned on the TIMIT dataset, suitable for English speech-to-text tasks.

Speech Recognition

Xlsr En Punctuation

An automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the English Common Voice dataset, with support for punctuation prediction.

Speech Recognition English

Wav2vec2 Base Repro Timit

An automatic speech recognition model fine-tuned from patrickvonplaten/wav2vec2-base-repro-960h-libri-85k-steps on the TIMIT_ASR - NA dataset.

Speech Recognition

patrickvonplaten

Unispeech Sat Base Plus Timit Ft

An automatic speech recognition (ASR) model fine-tuned from microsoft/unispeech-sat-base-plus on the TIMIT_ASR dataset.

Speech Recognition

patrickvonplaten

Wav2vec2 2 Bert Large No Adapter

An automatic speech recognition (ASR) model trained on the LibriSpeech dataset for converting English speech to text.

Speech Recognition

Wav2vec2 Random

An automatic speech recognition model fine-tuned from the wav2vec2-base-random model on the TIMIT_ASR dataset.

Speech Recognition

patrickvonplaten

Wav2vec2 Large English

An English automatic speech recognition model fine-tuned from facebook/wav2vec2-large on the Common Voice 6.1 dataset.

Speech Recognition

Transformers English

Wav2vec2 Xls R 1b English

This is an English speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple English speech datasets.

Speech Recognition

Transformers English

Wav2vec2 2 Bert Large No Adapter Frozen Enc

A speech recognition model trained on the librispeech_asr dataset, with a reported word error rate (WER) of 2.0133 on the evaluation set.

Speech Recognition

Wav2vec2 Large Lv60 Timit Asr

A speech recognition model fine-tuned from facebook/wav2vec2-large-lv60 on the timit_asr dataset.

Speech Recognition English

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase